About Silhouette (clustering)

Silhouette (clustering): English Wikipedia edition

Silhouette (clustering)
Silhouette refers to a method of interpretation and validation of consistency within clusters of data. The technique provides a succinct graphical representation of how well each object lies within its cluster. It was first described by Peter J. Rousseeuw in 1986.
== Definition ==

Assume the data have been clustered via any technique, such as k-means, into k clusters. For each datum i, let a(i) be the average dissimilarity of i with all other data within the same cluster. We can interpret a(i) as how well i is assigned to its cluster (the smaller the value, the better the assignment). We then define the average dissimilarity of point i to a cluster c as the average of the distances from i to the points in c.
Let b(i) be the lowest average dissimilarity of i to any cluster of which i is not a member. The cluster with this lowest average dissimilarity is said to be the "neighbouring cluster" of i because it is the next best fit cluster for point i.
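The two quantities defined above can be sketched in a few lines of Python. This is only an illustration: the function name `a_and_b`, the use of Euclidean distance as the dissimilarity, the sample points, and the handling of a cluster with a single member are all assumptions that do not come from the text.

```python
from math import dist  # Euclidean distance; the choice of dissimilarity is an assumption


def a_and_b(points, labels, i):
    """Return (a(i), b(i)) for the datum at index i.

    Assumes at least two clusters are present, otherwise b(i) is undefined.
    """
    own = labels[i]
    # Sum the dissimilarities from i to every other datum, grouped by cluster.
    sums, counts = {}, {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j == i:
            continue
        sums[lab] = sums.get(lab, 0.0) + dist(points[i], p)
        counts[lab] = counts.get(lab, 0) + 1
    # a(i): average dissimilarity to the other members of i's own cluster.
    # Assumption: a cluster with a single member gets a(i) = 0.
    a = sums[own] / counts[own] if own in counts else 0.0
    # b(i): lowest average dissimilarity to any cluster i does not belong to.
    b = min(sums[lab] / counts[lab] for lab in counts if lab != own)
    return a, b


# Hypothetical data: three clusters in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 3.0), (10.0, 0.0)]
labels = [0, 0, 1, 1, 2]
a0, b0 = a_and_b(points, labels, 0)
print(a0, b0)
```

For the first point, the only other member of its cluster is at distance 2, and the nearer of the two remaining clusters has average distance (4 + 5) / 2 = 4.5, so cluster 1 is its neighbouring cluster.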
We now define a silhouette:
:

s(i) = \begin{cases} 1 - a(i)/b(i), & \mbox{if } a(i) < b(i) \\ 0, & \mbox{if } a(i) = b(i) \\ b(i)/a(i) - 1, & \mbox{if } a(i) > b(i) \end{cases}
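As a rough sketch, the piecewise definition translates directly into a small function. The names `silhouette_value` and `silhouette_compact` are hypothetical. The second function uses the single-expression form (b(i) - a(i)) / max(a(i), b(i)), which is algebraically equivalent to the three cases whenever a(i) and b(i) are not both zero; it is included here only as a cross-check.

```python
from math import isclose


def silhouette_value(a, b):
    """s(i) from a(i) and b(i), following the three cases of the definition."""
    if a < b:
        return 1 - a / b
    if a > b:
        return b / a - 1
    return 0.0


def silhouette_compact(a, b):
    """Equivalent single-expression form; assumes a and b are not both zero."""
    return (b - a) / max(a, b)


# Illustrative inputs: a(i) = 2.0, b(i) = 4.5, and the same values swapped.
well_matched = silhouette_value(2.0, 4.5)
badly_matched = silhouette_value(4.5, 2.0)
print(well_matched, badly_matched)
print(isclose(well_matched, silhouette_compact(2.0, 4.5)))
```

Swapping a(i) and b(i) flips the sign of the result, which reflects the symmetry between the first and third cases.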

From the above definition it is clear that
:

-1 \le s(i) \le 1
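The bound can be checked case by case. The short derivation below assumes only that dissimilarities are non-negative, so that a(i) \ge 0 and b(i) \ge 0.

```latex
\begin{align*}
a(i) < b(i) &\implies 0 \le a(i)/b(i) < 1 \implies 0 < s(i) = 1 - a(i)/b(i) \le 1, \\
a(i) = b(i) &\implies s(i) = 0, \\
a(i) > b(i) &\implies 0 \le b(i)/a(i) < 1 \implies -1 \le s(i) = b(i)/a(i) - 1 < 0.
\end{align*}
```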

For s(i) to be close to 1 we require a(i) \ll b(i). As a(i) is a measure of how dissimilar i is to its own cluster, a small value means it is well matched. Furthermore, a large b(i) implies that i is badly matched to its neighbouring cluster. Thus an s(i) close to one means that the datum is appropriately clustered.
If s(i) is close to negative one, then by the same logic we see that i would be more appropriately assigned to its neighbouring cluster. An s(i) near zero means that the datum is on the border of two natural clusters.
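The three regimes can be illustrated on a small one-dimensional example. This is a sketch under stated assumptions: the data, the labels, the function name `silhouette_values`, the use of |x - y| as the dissimilarity, and the convention that a datum alone in its cluster gets s(i) = 0 are all chosen for illustration.

```python
def silhouette_values(values, labels):
    """Return s(i) for every datum, using |x - y| as the dissimilarity."""
    result = []
    for i, (x, own) in enumerate(zip(values, labels)):
        # Sum the dissimilarities from i to every other datum, per cluster.
        sums, counts = {}, {}
        for j, (y, lab) in enumerate(zip(values, labels)):
            if j != i:
                sums[lab] = sums.get(lab, 0.0) + abs(x - y)
                counts[lab] = counts.get(lab, 0) + 1
        if own not in counts:
            result.append(0.0)  # assumed convention for a single-member cluster
            continue
        a = sums[own] / counts[own]
        b = min(sums[lab] / counts[lab] for lab in counts if lab != own)
        if a < b:
            result.append(1 - a / b)
        elif a > b:
            result.append(b / a - 1)
        else:
            result.append(0.0)
    return result


# Hypothetical data: the last datum (11.0) is deliberately given the wrong label.
values = [0.0, 1.0, 2.0, 6.0, 10.0, 11.0, 12.0, 11.0]
labels = [0, 0, 0, 0, 1, 1, 1, 0]
s = silhouette_values(values, labels)
print(round(s[5], 2))  # datum 11.0 in cluster 1: well matched, about 0.86
print(round(s[3], 2))  # datum 6.0: equally far from both clusters, 0.0
print(round(s[7], 2))  # mislabelled datum 11.0: about -0.92
```

The datum at 6.0 has a(i) = b(i) = 5, so it sits exactly on the border, while the mislabelled datum has a(i) = 8.75 against b(i) = 2/3 and would fit much better in its neighbouring cluster.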
The average s(i) over all data of a cluster is a measure of how tightly grouped all the data in the cluster are. Thus the average s(i) over all data of the entire dataset is a measure of how appropriately the data have been clustered. If there are too many or too few clusters, as may occur when a poor choice of k is used in the k-means algorithm, some of the clusters will typically display much narrower silhouettes than the rest. Thus silhouette plots and averages may be used to determine the natural number of clusters within a dataset. One can also increase the likelihood of the silhouette being maximized at the correct number of clusters by re-scaling the data using feature weights that are cluster specific.
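Selecting the number of clusters by average silhouette might look like the sketch below. To keep it self-contained, the candidate labelings for k = 2, 3 and 4 are written by hand and stand in for the output of a clustering algorithm such as k-means; the data, the labelings, and the names `average_silhouette`, `candidates` and `best_k` are assumptions made for illustration.

```python
def average_silhouette(values, labels):
    """Average s(i) over the whole dataset, using |x - y| as the dissimilarity."""
    total = 0.0
    for i, (x, own) in enumerate(zip(values, labels)):
        sums, counts = {}, {}
        for j, (y, lab) in enumerate(zip(values, labels)):
            if j != i:
                sums[lab] = sums.get(lab, 0.0) + abs(x - y)
                counts[lab] = counts.get(lab, 0) + 1
        if own not in counts:
            continue  # assumed convention: a single-member cluster contributes 0
        a = sums[own] / counts[own]
        b = min(sums[lab] / counts[lab] for lab in counts if lab != own)
        if a != b:
            total += (1 - a / b) if a < b else (b / a - 1)
    return total / len(values)


# Hypothetical data with three visible groups, near 1, 5 and 9.
values = [0.8, 1.0, 1.2, 1.4, 4.8, 5.0, 5.2, 5.4, 8.8, 9.0, 9.2, 9.4]

# Hand-written labelings standing in for the result of clustering with each k.
candidates = {
    2: [0] * 8 + [1] * 4,                      # first two groups merged
    3: [0] * 4 + [1] * 4 + [2] * 4,            # one cluster per group
    4: [0] * 4 + [1] * 4 + [2] * 2 + [3] * 2,  # last group split in two
}

scores = {k: average_silhouette(values, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```

With too few clusters the merged group has a large a(i), and with too many the split group has a small b(i); both lower the average, so the three-cluster labeling scores highest on this data.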

Excerpt source: Wikipedia, the free encyclopedia